Learning language through pictures
We propose Imaginet, a model of learning visually grounded representations of
language from coupled textual and visual input. The model consists of two Gated
Recurrent Unit networks with shared word embeddings, and uses a multi-task
objective by receiving a textual description of a scene and trying to
concurrently predict its visual representation and the next word in the
sentence. Mimicking an important aspect of human language learning, it acquires
meaning representations for individual words from descriptions of visual
scenes. Moreover, it learns to effectively use sequential structure in semantic
interpretation of multi-word phrases.Comment: To appear at ACL 201
Revisiting the Hierarchical Multiscale LSTM
Hierarchical Multiscale LSTM (Chung et al., 2016a) is a state-of-the-art
language model that learns interpretable structure from character-level input.
Such models can provide fertile ground for (cognitive) computational
linguistics studies. However, the high complexity of the architecture, training
procedure and implementations might hinder its applicability. We provide a
detailed reproduction and ablation study of the architecture, shedding light on
some of the potential caveats of re-purposing complex deep-learning
architectures. We further show that simplifying certain aspects of the
architecture can in fact improve its performance. We also investigate the
linguistic units (segments) learned by various levels of the model, and argue
that their quality does not correlate with the overall performance of the model
on language modeling.
Comment: To appear in COLING 2018 (reproduction track)
Lessons learned in multilingual grounded language learning
Recent work has shown how to learn better visual-semantic embeddings by
leveraging image descriptions in more than one language. Here, we investigate
in detail which conditions affect the performance of this type of grounded
language learning model. We show that multilingual training improves over
bilingual training, and that low-resource languages benefit from training with
higher-resource languages. We demonstrate that a multilingual model can be
trained equally well on either translations or comparable sentence pairs, and
that annotating the same set of images in multiple languages enables further
improvements via an additional caption-caption ranking objective.
Comment: CoNLL 201
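A caption-caption (or image-caption) ranking objective of this kind is commonly a bidirectional max-margin loss over a batch of paired embeddings. The numpy sketch below is illustrative only; the margin value and function name are chosen for the example.

```python
import numpy as np

def ranking_loss(anchors, positives, margin=0.2):
    # Cosine similarity matrix between L2-normalised embedding rows;
    # row i of `anchors` is paired with row i of `positives`.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sim = a @ p.T
    pos = np.diag(sim)
    # Hinge on every contrastive (off-diagonal) pair, in both directions
    cost = np.maximum(0, margin + sim - pos[:, None])   # hardest wrong match per anchor
    cost += np.maximum(0, margin + sim - pos[None, :])  # and per positive
    np.fill_diagonal(cost, 0)
    return cost.sum()
```

The same loss applies unchanged whether the two sides are an image and a caption or two captions of the same image in different languages, which is what makes the extra caption-caption term cheap to add.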
Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling
Written language contains stylistic cues that can be exploited to
automatically infer a variety of potentially sensitive author information.
Adversarial stylometry intends to attack such models by rewriting an author's
text. Our research proposes several components to facilitate deployment of
these adversarial attacks in the wild, where neither data nor target models are
accessible. We introduce a transformer-based extension of a lexical replacement
attack, and show it achieves high transferability when trained on a weakly
labeled corpus -- decreasing target model performance below chance. While not
completely inconspicuous, our more successful attacks also prove notably less
detectable by humans. Our framework therefore provides a promising direction
for future privacy-preserving adversarial attacks.
Comment: Accepted to EACL 202
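In outline, a lexical replacement attack greedily swaps words so that a profiling model's confidence drops. The sketch below is a deliberately simplified stand-in: `candidates` would come from the paper's transformer-based substitute generator, and `score_fn` from a surrogate of the (inaccessible) target model; both are hypothetical here.

```python
def lexical_substitution_attack(tokens, candidates, score_fn, max_swaps=3):
    """Greedily replace words to minimise the profiler's score.

    candidates: dict mapping a token to candidate replacement words
    score_fn:   callable returning the probability the (surrogate) model
                assigns to the true author attribute
    """
    tokens = list(tokens)
    best = score_fn(tokens)
    swaps = 0
    for i in range(len(tokens)):
        if swaps >= max_swaps:
            break
        for sub in candidates.get(tokens[i], []):
            trial = tokens[:i] + [sub] + tokens[i + 1:]
            score = score_fn(trial)
            if score < best:  # keep the swap only if it fools the model more
                best, tokens = score, trial
                swaps += 1
                break
    return tokens, best
```

Transferability, in this framing, is the observation that swaps chosen against the surrogate `score_fn` also degrade a different, unseen target model.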
NeuralREG: An end-to-end approach to referring expression generation
Traditionally, Referring Expression Generation (REG) models first decide on
the form and then on the content of references to discourse entities in text,
typically relying on features such as salience and grammatical function. In
this paper, we present a new approach (NeuralREG), relying on deep neural
networks, which makes decisions about form and content in one go without
explicit feature extraction. Using a delexicalized version of the WebNLG
corpus, we show that the neural model substantially improves over two strong
baselines. Data and models are publicly available.
Comment: Accepted for presentation at ACL 201
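The delexicalisation step underlying this setup replaces entity mentions with placeholder tags; NeuralREG then learns to generate a contextually appropriate referring expression (pronoun, name, or description) for each tag. A toy sketch follows, where the tag format and example text are illustrative rather than the actual WebNLG markup.

```python
def delexicalize(text, entities):
    # Replace each entity mention with a placeholder tag and remember the map.
    mapping = {}
    for i, ent in enumerate(entities):
        tag = f"ENTITY-{i + 1}"
        text = text.replace(ent, tag)
        mapping[tag] = ent
    return text, mapping

def relexicalize(delex_text, mapping):
    # Trivial baseline realiser: paste the canonical name back in every slot.
    # NeuralREG instead predicts a context-sensitive expression per slot.
    for tag, ent in mapping.items():
        delex_text = delex_text.replace(tag, ent)
    return delex_text
```

The contrast between the trivial realiser above and a learned one is exactly where the model's joint form-and-content decision happens: the second mention of an entity should often surface as a pronoun rather than the full name.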
On the difficulty of a distributional semantics of spoken language
In the domain of unsupervised learning, most work on speech has focused on discovering low-level constructs such as phoneme inventories or word-like units. In contrast, for written language there is a large body of work on unsupervised induction of semantic representations of words, whole sentences, and longer texts. In this study we examine the challenges of adapting these approaches from written to spoken language. We conjecture that unsupervised learning of the semantics of spoken language becomes feasible if we abstract away from the surface variability. We simulate this setting with a dataset of utterances spoken by a realistic but uniform synthetic voice. We evaluate two simple unsupervised models which, to varying degrees of success, learn semantic representations of speech fragments. Finally, we present inconclusive results on human speech, and discuss the challenges inherent in learning distributional semantic representations of unrestricted natural spoken language.
The C-terminal domain of the 2b protein of Cucumber mosaic virus is stabilized by divalent metal ion coordination
The main function of the 2b protein of Cucumber mosaic virus (CMV) is to permanently bind
double-stranded siRNA molecules during the suppression of post-transcriptional gene silencing (PTGS).
The crystal structure of the homologue Tomato aspermy virus (TAV) 2b protein is known, but without
the C-terminal domain. The biologically active form is a tetramer: four 2b protein molecules and two
siRNA duplexes. To study the complete 2b protein structure, we performed a molecular dynamics (MD)
simulation of the whole siRNA–2b ribonucleoprotein complex. Unfortunately, the C-terminal domain
proved to be partially unstructured. Multiple sequence alignment showed a well-conserved motif
between residues 94 and 105. The negatively charged residues of the C-terminal domain are presumed
to take part in the coordination of a divalent metal ion and to stabilize the three-dimensional
structure of the C-terminal domain. MD simulations were performed on the detached C-terminal domains (aa 65–110).
Salt concentrations of 0.15 M MgCl2, CaCl2, FeCl2 and ZnCl2 were used in the screening simulations.
Among the tested divalent metal ions, Mg2+ proved the most successful, because Asp95, Asp96 and Asp98
form a quasi-permanent Mg2+ binding site. Control computations showed, however, that any (at least)
divalent metal ion remains in the binding site after replacement of the bound Mg2+ ion. A quadruple
mutation (Rs2DDTD/95–98/AAAA) was introduced into the position of the putative divalent metal ion
binding site to analyze the biological relevance of the hypothesis derived from molecular modeling.
Plant inoculation experiments showed that the movement of the mutant virus is slower and the symptoms
are milder compared to the wild-type virus. These results demonstrate that the quadruple mutation
weakens the stability of the 2b protein tetramer–siRNA ribonucleoprotein complex.
Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning
Recent work has highlighted the advantage of jointly learning grounded
sentence representations from multiple languages. However, the data used in
these studies has been limited to an aligned scenario: the same images
annotated with sentences in multiple languages. We focus on the more realistic
disjoint scenario in which there is no overlap between the images in
multilingual image--caption datasets. We confirm that training with aligned
data results in better grounded sentence representations than training with
disjoint data, as measured by image--sentence retrieval performance. In order
to close this gap in performance, we propose a pseudopairing method to generate
synthetically aligned English--German--image triplets from the disjoint sets.
The method works by first training a model on the disjoint data, and then
creating new triples across datasets using sentence similarity under the
learned model. Experiments show that pseudopairs improve image--sentence
retrieval performance compared to disjoint training, despite requiring no
external data or models. However, we do find that using an external machine
translation model to generate the synthetic data sets results in better
performance.
Comment: 10 page
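The pseudopairing step described above can be sketched as nearest-neighbour matching under the learned embeddings: embed the disjoint English and German captions with the first-stage model, then pair each English caption (and its image) with the most similar German caption. This is a minimal numpy version; the function name and the similarity threshold are invented for the illustration.

```python
import numpy as np

def pseudopairs(en_emb, de_emb, threshold=0.0):
    """Pair each English caption with its nearest German caption by
    cosine similarity under the (already trained) model's embeddings."""
    en = en_emb / np.linalg.norm(en_emb, axis=1, keepdims=True)
    de = de_emb / np.linalg.norm(de_emb, axis=1, keepdims=True)
    sim = en @ de.T
    pairs = []
    for i in range(len(en)):
        j = int(np.argmax(sim[i]))            # nearest German caption
        if sim[i, j] >= threshold:            # optionally drop weak matches
            pairs.append((i, j, float(sim[i, j])))
    return pairs
```

Each resulting (English caption, German caption, image) triple is then treated as if it were genuinely aligned and used for a second round of training, which is why the method needs no external data or models.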